There are 3,400 Census dissemination areas in Metro Vancouver, ranging from 0.01 to 145 km^2 in land area. Together, these areas account for over 955,000 households and 2,450,000 people.
We extracted several key housing-related variables for each dissemination area:
* Median Shelter Value,
* Median Total Household Income,
* Average Household Size, and
* Population Density
You can examine these variables in the series of interactive maps below. Mouse over a map to see the values of the variables for each area.
While the interactive maps give a general impression of how housing values, income, household size and population density vary across the Metro Vancouver region, it is difficult to draw correlations between housing value and the other variables from maps alone. Let’s look at this more closely in the next section of the EDA.
The relationship between population density and housing value is nonlinear. Areas with higher population density may have smaller housing units that are less expensive.
The relationship between housing value and total population in the area seems to be inversely proportional.
Housing value generally increases as the median household income in the dissemination area increases.
A larger average household size also correlates with higher housing value, potentially because larger households need more space and a bigger house.
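The inverse relationships noted above suggest entering population and employment into a linear model as reciprocals. A minimal sketch of how such terms could be built follows. The raw column names `pop_density`, `pop_tot` and `emp_tot`, the helper `safe_inverse()`, and the toy values are all assumptions for illustration, not the actual census extract or the code used to prepare it.

```r
# Toy data frame standing in for the census extract (values are made up).
toy <- data.frame(
  pop_density = c(1200, 4500, 0, 800),   # hypothetical: people per km^2
  pop_tot     = c(450, 900, 0, 600),     # hypothetical: total population
  emp_tot     = c(210, 480, 0, NA)       # hypothetical: total employment
)

# Reciprocal of x, with zeros and missing values mapped to NA so that
# lm() drops those rows (default na.action) instead of receiving Inf.
safe_inverse <- function(x) {
  ifelse(!is.na(x) & x > 0, 1 / x, NA_real_)
}

toy$inv_pop_density <- safe_inverse(toy$pop_density)
toy$inv_pop_tot     <- safe_inverse(toy$pop_tot)
toy$inv_emp_tot     <- safe_inverse(toy$emp_tot)

print(toy)
```

Mapping zero counts to `NA` is one possible design choice; it keeps the regression from failing on infinite values, at the cost of excluding unpopulated areas from the fit.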
# Model 1: uses inverse population density (inv_pop_density)
housing_value_model <-
  lm(shelter_val_med ~ income_hh_med + hhsize_avg + inv_pop_density + inv_emp_tot,
     data = census_data_rev)
summary(housing_value_model, corr = TRUE)
##
## Call:
## lm(formula = shelter_val_med ~ income_hh_med + hhsize_avg + inv_pop_density +
## inv_emp_tot, data = census_data_rev)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1468255  -374222  -117492   251981  3597853
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -4.285e+05  5.185e+04  -8.264  < 2e-16 ***
## income_hh_med    9.980e+00  4.674e-01  21.353  < 2e-16 ***
## hhsize_avg       8.240e+04  1.958e+04   4.208 2.64e-05 ***
## inv_pop_density -3.461e+06  2.335e+06  -1.482    0.138
## inv_emp_tot      1.300e+08  6.640e+06  19.582  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 590700 on 3298 degrees of freedom
## (98 observations deleted due to missingness)
## Multiple R-squared: 0.2855, Adjusted R-squared: 0.2846
## F-statistic: 329.5 on 4 and 3298 DF, p-value: < 2.2e-16
##
## Correlation of Coefficients:
##                 (Intercept) income_hh_med hhsize_avg inv_pop_density
## income_hh_med   -0.21
## hhsize_avg      -0.65       -0.47
## inv_pop_density  0.01       -0.09          0.02
## inv_emp_tot     -0.29       -0.11         -0.06      -0.01
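To make the coefficients concrete, the sketch below plugs two of the estimates reported above into simple what-if calculations. It assumes that both shelter value and income are measured in dollars. The $10,000 income increment and the one-person household increment are arbitrary illustrative choices, and the results describe associations under this linear model, holding the other predictors fixed, not causal effects.

```r
# Estimates copied from the summary output above.
b_income <- 9.980e+00   # change in median shelter value per unit of median household income
b_hhsize <- 8.240e+04   # change per additional person in average household size

# Illustrative increments (assumptions, not values from the data).
income_step <- 10000
hhsize_step <- 1

effect_income <- b_income * income_step
effect_hhsize <- b_hhsize * hhsize_step

cat("Income +10,000     ->", format(effect_income, big.mark = ","), "\n")
cat("Household size +1  ->", format(effect_hhsize, big.mark = ","), "\n")
```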
# Model 2: swaps inverse population density for inverse total population (inv_pop_tot)
housing_value_model <-
  lm(shelter_val_med ~ income_hh_med + hhsize_avg + inv_pop_tot + inv_emp_tot,
     data = census_data_rev)
summary(housing_value_model, corr = TRUE)
##
## Call:
## lm(formula = shelter_val_med ~ income_hh_med + hhsize_avg + inv_pop_tot +
## inv_emp_tot, data = census_data_rev)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2387131  -374891  -122229   263875  3597562
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -3.562e+05  5.382e+04  -6.618 4.24e-11 ***
## income_hh_med  1.034e+01  4.724e-01  21.889  < 2e-16 ***
## hhsize_avg     7.469e+04  1.959e+04   3.812 0.000141 ***
## inv_pop_tot   -1.370e+08  2.875e+07  -4.766 1.96e-06 ***
## inv_emp_tot    1.730e+08  1.120e+07  15.442  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 588900 on 3298 degrees of freedom
## (98 observations deleted due to missingness)
## Multiple R-squared: 0.2899, Adjusted R-squared: 0.2891
## F-statistic: 336.6 on 4 and 3298 DF, p-value: < 2.2e-16
##
## Correlation of Coefficients:
##               (Intercept) income_hh_med hhsize_avg inv_pop_tot
## income_hh_med -0.15
## hhsize_avg    -0.65       -0.48
## inv_pop_tot   -0.28       -0.19          0.09
## inv_emp_tot    0.06        0.09         -0.11      -0.81
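One way to weigh the two specifications is to compare fit statistics and to check how much the predictors overlap; the -0.81 correlation between the `inv_pop_tot` and `inv_emp_tot` coefficient estimates hints that those two terms carry overlapping information. The sketch below runs on simulated data, since the real `census_data_rev` is not reproduced here, and uses a hand-rolled variance inflation factor (the helper `vif_one()` is hypothetical). The numbers it prints therefore say nothing about the actual census models.

```r
set.seed(42)
n <- 500

# Simulated stand-in for census_data_rev; all values are synthetic.
pop_tot <- round(runif(n, 200, 2000))
emp_tot <- round(pop_tot * runif(n, 0.3, 0.6))   # employment tracks population
sim <- data.frame(
  income_hh_med   = rnorm(n, 80000, 20000),
  hhsize_avg      = runif(n, 1.5, 4),
  inv_pop_density = 1 / runif(n, 100, 10000),
  inv_pop_tot     = 1 / pop_tot,
  inv_emp_tot     = 1 / emp_tot
)
sim$shelter_val_med <- 10 * sim$income_hh_med + 8e4 * sim$hhsize_avg +
  1e8 * sim$inv_emp_tot + rnorm(n, sd = 5e5)

m_density <- lm(shelter_val_med ~ income_hh_med + hhsize_avg +
                  inv_pop_density + inv_emp_tot, data = sim)
m_total   <- lm(shelter_val_med ~ income_hh_med + hhsize_avg +
                  inv_pop_tot + inv_emp_tot, data = sim)

# Lower AIC and higher adjusted R-squared indicate the better-fitting model.
comparison <- data.frame(
  model  = c("density", "total"),
  AIC    = c(AIC(m_density), AIC(m_total)),
  adj_r2 = c(summary(m_density)$adj.r.squared, summary(m_total)$adj.r.squared)
)
print(comparison)

# Variance inflation factor for one predictor: 1 / (1 - R^2), where R^2 comes
# from regressing that predictor on the remaining predictors.
vif_one <- function(data, target, others) {
  r2 <- summary(lm(reformulate(others, response = target), data = data))$r.squared
  1 / (1 - r2)
}
vif_inv_pop_tot <- vif_one(sim, "inv_pop_tot",
                           c("income_hh_med", "hhsize_avg", "inv_emp_tot"))
print(vif_inv_pop_tot)
```

Because both models are fit to the same rows, their AIC values are directly comparable. A large variance inflation factor would suggest that the standard errors of the affected coefficients are inflated by collinearity.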
What do you think of these two models? What other variables do you think we should be looking at?